Skip to content

Conversation

@noob-se7en
Copy link
Contributor

Problem
Related to #15682
TLDR: In latest pinot version, FreshnessChecker fails for streams like Kinesis since latest stream offset cannot be determined always.

As per Kinesis code (from software.amazon.awssdk.services.kinesis.model):

`/**
 * <p>
 * The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence number of
 * <code>null</code>.
 * </p>
 * 
 * @return The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence
 *         number of <code>null</code>.
 */
public final String endingSequenceNumber() {
    return endingSequenceNumber;
}`

When FreshnessChecker is enabler for Kinesis streams, the server status will be marked as bad.

Solution
We skip freshness check for streams not supporting offset lag.

@noob-se7en noob-se7en changed the title Fix latest stream offset Skip Freshness check for streams not supporting offset lag Jan 23, 2026
@xiangfu0 xiangfu0 requested review from Copilot and xiangfu0 January 23, 2026 09:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a bug where the FreshnessChecker incorrectly marks Kinesis stream consumption as unhealthy because Kinesis does not always provide ending sequence numbers for active (OPEN) shards. The solution skips freshness checks for streams that don't support offset lag calculation.

Changes:

  • Added early return logic to skip freshness checks for streams that don't support offset lag operations
  • Logs a warning when freshness check is skipped for unsupported stream providers

Comment on lines 84 to 89
if (!streamMetadataProvider.supportsOffsetLag()) {
// Cannot conclude if segment has caught up or not. Skip such segments.
_logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}",
segmentName, currentOffset);
return true;
}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states 'Cannot conclude if segment has caught up or not' which implies uncertainty, but the code returns true (indicating the segment is caught up). This is misleading. The comment should clarify that returning true here means the segment is considered healthy/passing the freshness check by default for streams that don't support offset lag, not that it has actually caught up.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this. The comment should be a bit more clear on why we are returning true here.

Copy link
Contributor

@9aman 9aman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment. LGTM

Comment on lines 84 to 89
if (!streamMetadataProvider.supportsOffsetLag()) {
// Cannot conclude if segment has caught up or not. Skip such segments.
_logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}",
segmentName, currentOffset);
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this. The comment should be a bit more clear on why we are returning true here.


if (!streamMetadataProvider.supportsOffsetLag()) {
// Cannot conclude if segment has caught up or not. Skip such segments.
_logger.warn("Stream provider for segment: {} does not support offset subtraction. Current offset: {}",
Copy link
Contributor

@xiangfu0 xiangfu0 Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think of just logging this once per table? This will be called for every consuming segment for every health check call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this. This was the existing behaviour as well.

@noob-se7en noob-se7en requested a review from xiangfu0 January 23, 2026 10:30
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.18%. Comparing base (b3b72ae) to head (6e25238).

Files with missing lines Patch % Lines
.../helix/IngestionBasedConsumptionStatusChecker.java 45.45% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17560      +/-   ##
============================================
+ Coverage     63.14%   63.18%   +0.03%     
  Complexity     1476     1476              
============================================
  Files          3172     3172              
  Lines        189783   189793      +10     
  Branches      29041    29043       +2     
============================================
+ Hits         119842   119920      +78     
+ Misses        60629    60552      -77     
- Partials       9312     9321       +9     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.12% <45.45%> (+<0.01%) ⬆️
java-21 63.16% <45.45%> (+0.04%) ⬆️
temurin 63.18% <45.45%> (+0.03%) ⬆️
unittests 63.18% <45.45%> (+0.03%) ⬆️
unittests1 55.49% <ø> (-0.01%) ⬇️
unittests2 34.06% <45.45%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0
Copy link
Contributor

Please fix test: FreshnessBasedConsumptionStatusCheckerTest

@noob-se7en
Copy link
Contributor Author

cursor messing up. fixing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants